A Data-Driven Framework for Intonational Phrase Break Prediction

Authors

  • Manolis Maragoudakis
  • Panagiotis Zervas
  • Nikos Fakotakis
  • George K. Kokkinakis
Abstract

In the present work, we study the automatic acquisition of intonational phrase breaks. We propose a mathematically well-founded framework based on Bayesian theory. From two different assumptions about the conditional independence of the input attributes, we derive two Bayesian implementations: the Naïve Bayes classifier and the Bayesian network classifier. As a performance benchmark, we compared the experimental results against CART, an acclaimed algorithm for intonational phrase break detection that has achieved state-of-the-art figures. Our approach uses minimal morphological and syntactic resources within a finite-length window, namely the part-of-speech (POS) label and the type of syntactic phrase boundary, a novel attribute not previously applied to this task. On a 5,500-word training set, the Bayesian network approach proved the most effective, achieving precision and recall of approximately 82% and 77%, respectively.
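
To make the Naïve Bayes variant concrete, here is a minimal sketch, not the authors' implementation. Each word juncture is described by the POS tag and the syntactic phrase boundary type of every word in a window around it, and the classifier picks the label c (break or no break) that maximizes P(c) multiplied by the product of P(f_k | c) over all window features f_k, i.e. assuming the features are conditionally independent given the class. The window width of ±1 words, the tag and boundary names ("NP-end", "VP-end", "none"), the add-one smoothing and the toy data are all hypothetical choices for illustration. The Bayesian network variant, which relaxes this independence assumption, and the CART baseline are not shown.

```python
import math
from collections import Counter, defaultdict

PAD = "<pad>"  # hypothetical filler for window positions outside the sentence


def window_features(sentence, i, half_width=1):
    """Describe the juncture after word i by the POS tag and phrase boundary
    type of every word in a window of +/- half_width words around it."""
    feats = []
    for offset in range(-half_width, half_width + 1):
        j = i + offset
        if 0 <= j < len(sentence):
            pos, boundary, _ = sentence[j]
        else:
            pos, boundary = PAD, PAD  # window extends past the sentence edge
        feats.append((f"pos{offset:+d}", pos))
        feats.append((f"bnd{offset:+d}", boundary))
    return feats


def junctures(corpus, half_width=1):
    """Turn sentences of (pos, boundary, label) triples into (X, y) pairs."""
    X, y = [], []
    for sentence in corpus:
        for i, (_, _, label) in enumerate(sentence):
            X.append(window_features(sentence, i, half_width))
            y.append(label)
    return X, y


class NaiveBayes:
    """Categorical Naive Bayes: window features are assumed conditionally
    independent given the break / no-break class."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.feat_counts = defaultdict(Counter)  # (class, feature) -> value counts
        self.values = defaultdict(set)           # feature -> values seen in training
        for feats, label in zip(X, y):
            for name, value in feats:
                self.feat_counts[(label, name)][value] += 1
                self.values[name].add(value)
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / total)  # log prior P(c)
            for name, value in feats:
                # Add-one smoothing; the extra slot covers values unseen in training.
                v = len(self.values[name]) + 1
                count = self.feat_counts[(c, name)][value]
                score += math.log((count + 1) / (n_c + v))
            if score > best_score:
                best, best_score = c, score
        return best


def precision_recall(gold, pred, positive=1):
    """Precision and recall of the break class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy data: (POS tag, phrase boundary type, 1 if a break follows).
    train = [
        [("DET", "none", 0), ("NOUN", "NP-end", 1), ("VERB", "none", 0), ("NOUN", "NP-end", 1)],
        [("ADJ", "none", 0), ("NOUN", "NP-end", 1), ("VERB", "VP-end", 0)],
    ]
    test = [[("DET", "none", 0), ("NOUN", "NP-end", 1), ("VERB", "VP-end", 0)]]
    model = NaiveBayes().fit(*junctures(train))
    X_test, y_test = junctures(test)
    preds = [model.predict(f) for f in X_test]
    print("precision=%.2f recall=%.2f" % precision_recall(y_test, preds))
```

Precision and recall are computed for the break class only, since non-breaks dominate word junctures and would otherwise inflate accuracy-style figures.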

Similar resources

Intonational phrase break prediction using decision tree and n-gram model

In the current study, we propose and evaluate a new method for automatic intonational phrase break prediction based on sequences of parts-of-speech and word junctures. The proposed method uses decision trees to estimate the probability of a word juncture type (break or non-break) given a finite length window of part-of-speech values, and uses an n-gram to model the word juncture sequence. Train...

Syntactic and prosodic parenthesis

This paper examines the view that parentheticals obligatorily form an intonational phrase and break up the intonational phrase of the matrix sentence into two intonational phrases. The analysis of spontaneous speech data of Hamburg German shows that neither do all parentheticals form a distinct intonational phrase nor do all parentheticals break up the intonational phrase of the matrix sentence...

Phrase break prediction using logistic generalized linear model

In this paper we propose a novel phrase break prediction model for Mandarin speech synthesis. It is a generalized linear model (GLM) with a stepwise regression solution. We assume that phrase breaks follow a Bernoulli distribution and then model the phrase break probability with a logistic GLM. The attribute set is automatically selected by stepwise regression, which is a totally data-driven method. We also introd...

Bayesian induction of intonational phrase breaks

For the present paper, a Bayesian probabilistic framework for the task of automatic acquisition of intonational phrase breaks was established. By considering two different conditional independence assumptions, the naïve Bayes and Bayesian networks approaches were regarded and evaluated against the CART algorithm, which has been previously used with success. A finite length window of minimal mor...

Using multiple linguistic features for Mandarin phrase break prediction in maximum-entropy classification framework

We model Mandarin phrase break prediction as a classification problem with three-level prosodic structures and apply conditional maximum entropy classification to this problem. We acquire multiple levels of linguistic knowledge from an annotated corpus and integrate them as features in the maximum entropy framework. Five kinds of features were used to represent various linguistic constraints i...

Publication date: 2003